Rate dependent neural responses of interaural-time-difference cues in fine-structure and envelope

Advancements in cochlear implants (CIs) have led to a significant increase in bilateral CI users, especially among children. Yet most bilateral CI users do not fully achieve the intended binaural benefit, owing to potential limitations in signal processing and/or surgical implant positioning. One crucial auditory cue that normal-hearing (NH) listeners benefit from is the interaural time difference (ITD), i.e., the difference in the arrival time of a sound at the two ears. ITD sensitivity is thought to rely heavily on the effective use of temporal fine structure (the very rapid oscillations in sound). Unfortunately, most current CIs do not transmit true fine structure. Nevertheless, bilateral CI users have demonstrated sensitivity to ITD cues delivered through the envelope or through interaural pulse time differences, i.e., the time gap between the pulses delivered to the two implants. However, their ITD sensitivity is significantly poorer than that of NH individuals, and it degrades further at higher CI stimulation rates, especially when the rate exceeds 300 pulses per second. The overall purpose of this research thread is to improve spatial hearing abilities in bilateral CI users. This study aims to develop electroencephalography (EEG) paradigms that can be used in clinical settings to assess and optimize the delivery of ITD cues, which are crucial for spatial hearing in everyday life. The research objective of this article was to determine the effect of CI stimulation pulse rate on ITD sensitivity, and to characterize the rate-dependent degradation in ITD perception using EEG measures. To develop protocols for bilateral CI studies, EEG responses were obtained from NH listeners using sinusoidally amplitude-modulated (SAM) tones and filtered clicks with changes in either fine structure ITD (ITD_FS) or envelope ITD (ITD_ENV).
Multiple EEG responses were analyzed, including subcortical auditory steady-state responses (ASSRs) and cortical auditory evoked potentials (CAEPs) elicited by stimulus onsets, offsets, and changes. Acoustic change complex (ACC) responses elicited by ITD_ENV changes were significantly smaller than those elicited by ITD_FS changes, or were absent. The ACC morphologies evoked by ITD_FS changes were similar to those of the onset and offset CAEPs, although the peak latencies were longest for ACC responses and shortest for offset CAEPs. The high-frequency stimuli clearly elicited subcortical ASSRs, but these were smaller than those evoked by SAM tones with lower carrier frequencies. The 40-Hz ASSRs decreased with increasing carrier frequency. Filtered clicks elicited larger ASSRs than high-frequency SAM tones, with the order being 40 > 160 > 80 > 320 Hz ASSR for both stimulus types. Wavelet analysis revealed a clear interaction between detectable transient CAEPs and 40-Hz ASSRs in the time-frequency domain for SAM tones with a low carrier frequency.


Psychoacoustic pretest-experiments
Twelve NH participants (S1-S12: 6 males and 6 females aged 21-42 years, mean age 27.2 years) took part in the psychoacoustic experiments. The stimuli were generated digitally on a personal computer running MATLAB (The MathWorks, Natick, MA, USA) and converted to analog form using a Fireface UC sound card (RME Audio, Haimhausen, Germany) with 24-bit resolution and a sampling rate of 48 kHz. They were presented to the participants through Sennheiser HD580 headphones at 70 dB sound pressure level (SPL). The participants were seated in a double-walled soundproof booth and responded by clicking virtual buttons displayed on a monitor. Three lateralization experiments (5-7 minutes per experiment) were performed to determine the upper frequency limit of left/right discrimination abilities. The experiments used a two-up, one-down, two-alternative forced-choice (2-AFC) procedure to estimate the 71%-correct point on the psychometric function (Levitt, 1971). On each trial, two consecutive intervals were presented, separated by 500 ms. Each interval contained four consecutive 400-ms tones or filtered clicks with 20-ms raised-cosine rise/fall ramps, separated by 100 ms. One interval, selected at random, was the standard and had an ITD of the carrier (ITD_FS) or of the envelope (ITD_ENV) of 0. The other interval, the target, had the same first and third tones as the standard, but its second and fourth tones had a non-zero ITD of equal magnitude. In all three experiments, participants were asked to identify which of the two intervals contained a sequence that appeared to change within the head.

Experiment 1 was conducted to determine the upper limit of the carrier frequency for fine structure ITD sensitivity by applying an interaural phase difference (IPD) of π/2 to the carrier. The experiment used SAM tones with a fixed modulation frequency (f_m) of 40 Hz and an adaptive carrier frequency (f_c) ranging from 100 Hz to 4000 Hz. The carrier frequency started at 1000 Hz and was adjusted using adaptation factors of 1.4, 1.2, and 1.1. Before the formal experiment, a brief training task familiarized the participants with the procedure. The formal test was terminated after eight reversals, and the threshold was calculated as the geometric mean of the last six reversal values. This procedure was adapted from the binaural TFS sensitivity test (TFS-AF) (Füllgrabe et al., 2017; Füllgrabe and Moore, 2017).

Experiment 2 was conducted to determine the upper limit of the modulation rate for envelope ITD sensitivity, with a fixed ITD of 500 µs (dichotic) or 0 (diotic) applied to the envelope. The f_c was fixed at 4000 Hz, while the f_m was adaptive. Experiment 3 was performed to determine the upper limit of the pulse rate for interaural pulse time difference (IPTD) sensitivity, using an ITD of 500 µs applied to the pulses and adjusting the pulse rate. Both experiments were similar to experiment 1, except that the starting f_m or pulse rate was 100 Hz or 100 pps, with a minimum of 10 Hz or pps and a maximum of 1500 Hz or pps. The adaptation factors were 80, 40, and 20. To accommodate individual differences, no upper limit was set for the adaptive f_m. As a result, some participants may have heard the lower sidebands of the modulation at higher modulation rates (e.g., above ~350 Hz) (Kohlrausch et al., 2000), which turns the ITD change detection task into a disparity detection task that is no longer based on envelope ITD cues alone. The decision not to set an upper limit was made to allow for the possibility that participants use other cues, which could also trigger ACC responses.
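The adaptive track of experiment 1 can be illustrated with a minimal sketch. Several details are assumptions rather than the authors' implementation: the text does not say when the adaptation factor changes, so the sketch shrinks it after each of the first reversals; `run_track`, `ipd_to_itd_us`, and the simulated listener are hypothetical names; and a trial cap is added so that a track without reversals ends.

```python
import math

def ipd_to_itd_us(ipd_rad, fc_hz):
    """Convert an interaural phase difference (rad) at fc_hz to an ITD in microseconds."""
    return ipd_rad / (2 * math.pi * fc_hz) * 1e6

def run_track(respond, start=1000.0, lo=100.0, hi=4000.0,
              factors=(1.4, 1.2, 1.1), n_reversals=8, n_average=6,
              max_trials=500):
    """Adaptive track on frequency: two correct answers in a row raise the
    frequency (harder), one wrong answer lowers it (easier).
    `respond(value)` returns True for a correct answer.
    Returns the geometric mean of the last `n_average` reversals, or None
    if the track does not reach `n_reversals` within `max_trials`."""
    value, n_correct, direction = start, 0, 0
    reversals = []
    for _ in range(max_trials):
        if respond(value):
            n_correct += 1
            if n_correct < 2:
                continue  # wait for a second consecutive correct answer
            step = +1
        else:
            step = -1
        n_correct = 0
        if direction and step != direction:
            reversals.append(value)
            if len(reversals) == n_reversals:
                break
        direction = step
        # ASSUMPTION: the factor shrinks after each of the first reversals.
        factor = factors[min(len(reversals), len(factors) - 1)]
        value = min(hi, max(lo, value * factor ** step))
    if len(reversals) < n_reversals:
        return None
    last = reversals[-n_average:]
    return math.exp(sum(math.log(v) for v in last) / len(last))

# Hypothetical listener who is always correct up to 1400 Hz and wrong above it.
threshold_hz = run_track(lambda fc: fc <= 1400.0)
```

Because the IPD is fixed at π/2, the equivalent ITD shrinks as the carrier frequency rises: at 1000 Hz it corresponds to 250 µs, and at 1600 Hz to about 156 µs.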
$$ s(t) = \sin(2\pi f_c t)\,\bigl(1 - \cos(2\pi f_m t)\bigr) \qquad (1) $$

Figure 3A, column 1, shows an example of a stimulus used in psychoacoustic experiment 1 (SAM tones ITD_FS, f_c = 1000 Hz and f_m = 40 Hz). In this example, the first interval is the target (row 2), and the second interval is the standard (row 3). Both the psychoacoustic and EEG experiments employed an IPD of 0 or π/2. EEG experiment 1 tested four carrier frequencies, f_c = [400, 800, 1200, 1600] Hz. Figure 3B, column 1, shows an example of a stimulus used in EEG experiment 1 (SAM tones ITD_FS, f_c = 400 Hz and f_m = 40 Hz), where each presentation lasted 8 seconds (s). The sequence included 2 s of the diotic stimulus (IPD = 0 in time window T1), followed by 2 s of the dichotic stimulus (IPD = π/2 in time window T2; T1→T2 is referred to as outward switching), then 2 s of the standard stimulus (IPD = 0 in time window T3; T2→T3 is referred to as inward switching), and 2 s of silence (time window T4).

The stimuli used in the second psychoacoustic experiment (not shown in Figure 3A) were also SAM tones, but the ITD was applied to the envelope instead of the carrier. In EEG experiment 2, four modulation frequencies were tested, f_m = [40, 80, 160, 320] Hz. Figure 3B, column 2, shows an example of a stimulus used in this experiment (SAM tones ITD_ENV, f_c = 4000 Hz and f_m = 40 Hz). As in EEG experiment 1, it consisted of 2 s of the diotic stimulus (ITD_ENV = 0 in time window T1), followed by 2 s of the dichotic stimulus (ITD_ENV = 500 µs in time window T2; outward switching T1→T2), then again 2 s of the standard stimulus (ITD_ENV = 0 in time window T3; inward switching T2→T3), and 2 s of silence (time window T4). Because Ross (2018) found no detectable ACCs in most participants for the 4000 Hz SAM tone, the ACCs in experiment 2 were expected to be smaller than in experiment 1, or absent. Note that, as in Kohlrausch et al. (2000), no precautions were taken to mask possible distortion products in either the psychoacoustic experiment 2 or the EEG experiment 2.
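The 8-s EEG presentation can be sketched in Python with NumPy as follows. This is a minimal illustration, not the authors' MATLAB code: it assumes a sine carrier multiplied by a raised-cosine modulator that starts at its trough, applies the interaural cue to the right ear only, and uses the hypothetical function names `sam_tone` and `eeg_sequence`.

```python
import numpy as np

FS = 48_000  # sampling rate in Hz, as reported for the psychoacoustic stimuli

def sam_tone(fc, fm, dur, ipd=0.0, itd_env=0.0, fs=FS):
    """Return (left, right) SAM tones; the right ear carries the interaural cue.
    ipd is a carrier phase shift in rad, itd_env an envelope delay in seconds."""
    t = np.arange(int(round(dur * fs))) / fs

    def one_ear(carrier_phase, env_delay):
        envelope = 1.0 - np.cos(2 * np.pi * fm * (t - env_delay))
        return np.sin(2 * np.pi * fc * t + carrier_phase) * envelope

    return one_ear(0.0, 0.0), one_ear(ipd, itd_env)

def eeg_sequence(fc=400.0, fm=40.0, ipd=np.pi / 2, itd_env=0.0, seg=2.0, fs=FS):
    """T1 diotic, T2 dichotic, T3 diotic, T4 silence; returns an array of shape (2, n)."""
    diotic = sam_tone(fc, fm, seg, fs=fs)
    dichotic = sam_tone(fc, fm, seg, ipd=ipd, itd_env=itd_env, fs=fs)
    silence = np.zeros(int(round(seg * fs)))
    left = np.concatenate([diotic[0], dichotic[0], diotic[0], silence])
    right = np.concatenate([diotic[1], dichotic[1], diotic[1], silence])
    return np.stack([left, right])

# EEG experiment 1 style stimulus: carrier IPD switches between 0 and pi/2.
seq_fs = eeg_sequence(fc=400.0, fm=40.0, ipd=np.pi / 2)
# EEG experiment 2 style stimulus: 500-microsecond envelope ITD on a 4000-Hz carrier.
seq_env = eeg_sequence(fc=4000.0, fm=40.0, ipd=0.0, itd_env=500e-6)
```

In this sketch each 2-s segment holds a whole number of 40-Hz modulation cycles, so the carrier phase switch in `seq_fs` falls at an envelope minimum.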

Filtered clicks
In experiment 3, filtered clicks generated as in Hu et al. (2017) and Hu et al. (2022) were used to simulate the signal delivered to CI users. The pulse train was band-limited to 3-5 kHz, with a center frequency of f_c = 4 kHz. These band-limited pulse trains were then sinusoidally amplitude-modulated using formula (2).
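Formula (2) is not legible in this copy. A plausible form, assuming the band-limited pulse train is multiplied by the same kind of raised-cosine modulator that is used for SAM tones and that starts at the trough of the modulation, is given below. The symbol p(t) for the band-limited pulse train is a hypothetical name introduced only for this illustration.

```latex
s(t) = p(t)\,\bigl(1 - \cos(2\pi f_m t)\bigr)
```

Under this assumption the modulator is 0 at t = 0 and returns to 0 after every period 1/f_m, i.e., every 0.4 s for f_m = 2.5 Hz and every 0.1 s for f_m = 10 Hz.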
In formula (2), the f_m was 2.5 Hz in the psychoacoustic experiment (the reciprocal of the duration of each filtered-click stimulus, i.e., 1/0.4 s) and 10 Hz in the EEG experiments. This type of SAM ensures that the stimuli start at the trough of the modulation. Figure 3A (column 2) and Figure 3B (column 3) show examples of the stimuli used in the psychoacoustic experiment (filtered clicks IPTD, pulse rate = 160 pps and f_m = 2.5 Hz, IPTD = 0 or IPTD = 500 µs) and in the EEG experiment (filtered clicks IPTD, pulse rate = 160 pps and f_m = 10 Hz, IPTD = 0 or IPTD = 500 µs), respectively. In the EEG experiment, four fixed pulse rates of [40, 80, 160, 320] pps were used. Each presentation lasted 6 s: 2 s of the diotic stimulus, followed by 2 s of the dichotic stimulus (the T1→T2 transition, referred to as outward switching), and 2 s of silence (time window T4, with a T2→T3 inward switching).

In both the psychoacoustic experiment 3 and the EEG experiment 3, a low-pass noise, uncorrelated between the ears, was added to the filtered clicks to mask potential distortion products. The low-pass noise was created by generating broadband noise in the time domain, converting it to the frequency domain, and setting the power of all components above 1000 Hz to zero. The noise was then shaped to have a flat spectrum up to 200 Hz and a spectral density that decreased by 3 dB/octave above 200 Hz. It was further filtered with a 5th-order low-pass filter with a cut-off frequency of 1000 Hz (Hu et al., 2017) and gated with 50-ms raised-cosine ramps. The noise was presented at 40 dB SPL, and the test stimulus was centered within the noise presentation.

We chose 4000 Hz instead of a higher carrier frequency such as 8000 Hz for several reasons. First, previous studies have shown that the upper modulation rate is lower for stimuli centered at 8000 Hz than for those centered at 4000 Hz (Bernstein and Trahiotis, 2013). Since only 2 out of 14 participants in Ross (2018) showed significant responses at 4000 Hz, we would expect similar or even smaller responses at 8000 Hz. Second, as the aging population is one target group for future studies, high-frequency hearing loss may make a higher frequency less suitable. Lastly, although it is not a critical factor, 8000 Hz is less pleasant to listen to than 4000 Hz.
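The spectral shaping of the masking noise can be sketched as follows. This is an illustration under stated assumptions, not the authors' code: the function names are hypothetical, the −3 dB/octave slope is implemented as an amplitude gain proportional to 1/√f above 200 Hz, and the additional 5th-order low-pass filter and the calibration to 40 dB SPL are omitted.

```python
import numpy as np

def shaping_gain(freqs, f_flat=200.0, f_cut=1000.0):
    """Amplitude gain: flat up to f_flat, -3 dB/octave in power above it,
    and zero above f_cut."""
    freqs = np.asarray(freqs, dtype=float)
    gain = np.ones_like(freqs)
    sloped = freqs > f_flat
    gain[sloped] = np.sqrt(f_flat / freqs[sloped])
    gain[freqs > f_cut] = 0.0
    return gain

def lowpass_noise(dur, fs=48_000, seed=None):
    """Gaussian noise shaped in the frequency domain and returned in the time domain."""
    rng = np.random.default_rng(seed)
    n = int(round(dur * fs))
    spectrum = np.fft.rfft(rng.standard_normal(n))
    freqs = np.fft.rfftfreq(n, d=1.0 / fs)
    return np.fft.irfft(spectrum * shaping_gain(freqs), n=n)

def gate(x, fs=48_000, ramp=0.050):
    """Apply raised-cosine onset and offset ramps of `ramp` seconds."""
    k = int(round(ramp * fs))
    rise = 0.5 * (1.0 - np.cos(np.pi * np.arange(k) / k))
    y = x.copy()
    y[:k] *= rise
    y[-k:] *= rise[::-1]
    return y

# Independent draws for the two ears give interaurally uncorrelated noise.
noise_left = lowpass_noise(1.0, seed=1)
noise_right = lowpass_noise(1.0, seed=2)
gated_left = gate(noise_left)
```

Using separate random draws per ear is what makes the masker uncorrelated between the ears, so it carries no consistent interaural cue of its own.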

Psychoacoustic pretest results
Supplementary Figure 1 shows violin plots of the three upper limits (carrier frequency, modulation rate, and pulse rate) obtained in the three psychoacoustic experiments. The violin plots (Hintze and Nelson, 1998) were generated using freely available MATLAB code (https://github.com/bastibe/Violinplot-Matlab). The original box plot is included as a grey box in the center of each violin. The individual data of the 12 participants are shown as solid blue dots, randomly jittered around the center line, and the corresponding density curves are drawn around each center line. If a participant could not do the task, the value was set to 0.123456. Supplementary Figure 1 indicates that the upper limits vary across participants. The mean and standard deviation of the carrier-frequency upper limit were 1393 ± 284 Hz, which is in the range of previously reported values (Ross et al., 2007a; Ross et al., 2007b; Grose and Mamo, 2010; Hopkins and Moore, 2010, 2011; Brughera et al., 2013; Füllgrabe and Moore, 2017; Papesh et al., 2017; Füllgrabe and Moore, 2018). It should be noted that the top-performing participant in this study exhibited a higher upper limit than those reported by Klug and Dietz (2022), possibly because different stimuli and test procedures were used. The purpose of the pretests was to select the rate conditions for the EEG experiments. The exact upper frequency limit in humans is not the focus of this study, and a more detailed discussion is beyond the scope of this paper.
Supplementary Figure 1. The top panels show violin plots of the upper limit frequencies obtained in the psychoacoustic experiments, with each participant represented by a solid dot. The bottom panels display the correlations between the three upper limit frequencies. Participants S9-S12 (pentagram symbols) were unable to attend the EEG experiment. S3 could not reach an upper limit for the modulation rate and S5 could not reach an upper limit for the pulse rate; both are represented by diamond symbols. The dotted red lines in panels B, D, and F indicate the boundary of 350 Hz.
The top middle panel of Supplementary Figure 1 shows the upper limit of the modulation rate. Because no limit was set in the adaptive procedure, some participants reached upper limits above 350 Hz; participant S1 even reached 980 Hz. This was expected, because the task may become easier again for some participants if they are able to use spectrally resolved sidebands at modulation rates above a certain frequency (e.g., ~350 Hz, the red dotted horizontal line) (Kohlrausch et al., 2000). This phenomenon may be more prominent in the disparity detection procedure used in this study than in classical left/right discrimination tasks. To avoid a misleading interpretation, the mean and standard deviation of the modulation-rate upper limit were calculated after excluding the participant who could not complete the task (S3) and those with upper limits above 350 Hz (6 data points, indicated by the empty circles in the upper middle panel of Supplementary Figure 1, which may result from resolved sidebands). The resulting mean and standard deviation were 207 ± 99 Hz. Some caution is necessary when interpreting the correlation between the modulation-rate upper limit and other experimental results. However, the same issue did not arise for the EEG results shown in Section 3.2, because the maximum modulation frequency tested in the EEG experiment was limited to 320 Hz. The mean and standard deviation of the pulse-rate upper limit for filtered clicks, after excluding S5 and S10, were 207 ± 97 pps.

The Pearson correlation coefficients between the three upper limits were as follows: between the carrier-frequency and modulation-rate limits (excluding S3), r = 0.63, p = 0.04; between the carrier-frequency and pulse-rate limits (excluding S5 and S10), r = 0.63, p = 0.05; and between the modulation-rate and pulse-rate limits (excluding S3, S5, and S10), r = 0.37, p = 0.33. S3 and S5 were unable to perform the corresponding experiments. However, both detected changes when presented with 100 Hz SAM tones and 100 pps filtered clicks with a 500 µs ITD. We speculate that this was mainly due to the large initial adaptive step size and their difficulty in focusing during the first reversal. S8 reported that he occasionally experienced mild tinnitus. Despite this, he was included in the EEG experiment, as his audiometry results were within the normal hearing range and his lateralization performance was above average. Regrettably, participants S9-S12 were unable to attend the EEG experiments for reasons relating to the COVID-19 pandemic.
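The exclusion logic behind these summary statistics can be sketched as follows. The threshold values are hypothetical and serve only to show the mechanics; the function names are illustrative, and the use of the sample standard deviation (n − 1) is an assumption, since the text does not state which estimator was used.

```python
import math

NO_THRESHOLD = 0.123456  # placeholder assigned when a participant could not do the task
SIDEBAND_LIMIT_HZ = 350.0

def mean_sd(values):
    """Mean and sample standard deviation (n - 1 in the denominator; an assumption)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return mean, sd

def keep_valid(values, upper=None):
    """Drop placeholder values and, optionally, values above `upper`."""
    return [v for v in values
            if v != NO_THRESHOLD and (upper is None or v <= upper)]

def pearson_r(x, y):
    """Pearson correlation after pairwise exclusion of placeholder values."""
    pairs = [(a, b) for a, b in zip(x, y)
             if a != NO_THRESHOLD and b != NO_THRESHOLD]
    mx = sum(a for a, _ in pairs) / len(pairs)
    my = sum(b for _, b in pairs) / len(pairs)
    sxy = sum((a - mx) * (b - my) for a, b in pairs)
    sxx = sum((a - mx) ** 2 for a, _ in pairs)
    syy = sum((b - my) ** 2 for _, b in pairs)
    return sxy / math.sqrt(sxx * syy)

# HYPOTHETICAL modulation-rate limits in Hz (not the study's data).
example_limits = [980.0, 150.0, NO_THRESHOLD, 300.0, 250.0, 400.0]
example_mean, example_sd = mean_sd(keep_valid(example_limits, SIDEBAND_LIMIT_HZ))
```

Excluding values pairwise rather than listwise matches the way each reported correlation names its own set of excluded participants.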

Time domain (CAEPs)

CAEPs of experiment 1
In experiment 1 (ITD_FS; paper Figure 4A and Figure 6A), the amplitude and latency of the offset responses (aqua) were relatively consistent across carrier frequencies, and the N1 latency of the offset responses was generally shorter than that of the onset and ACC responses. According to GLMrm, the N1P2 amplitude was significantly affected by carrier frequency (f_c), response type, and their interaction (p < 0.005). The mean amplitude was 7.318/6.063/5.094/4.023 µV for 400/800/1200/1600 Hz, respectively. There were no significant differences between 400, 800, and 1200 Hz, but the N1P2 amplitude for 1600 Hz was significantly smaller than for the other carrier frequencies. For the offset CAEPs, there were no significant differences between carrier frequencies. For the onset CAEPs, the differences were not significant, except that the N1P2 amplitude for 1200 Hz was slightly larger than that for 1600 Hz (p = 0.048). For ACC1 (outward) responses, the N1P2 amplitudes for 400 Hz and 800 Hz were significantly larger than those for 1200 Hz and 1600 Hz, but there were no significant differences between 400 Hz and 800 Hz, or between 1200 Hz and 1600 Hz. For ACC2 (inward) responses, the N1P2 amplitude for 800 Hz was significantly larger than that for 1200 Hz and 1600 Hz, and the N1P2 amplitude for 400 Hz was significantly larger than that for 1600 Hz. The mean N1P2 amplitudes were 8.577/4.816/4.172/4.932 µV for onset/ACC1/ACC2/offset responses, respectively. The onset CAEPs were significantly larger than the ACC1, ACC2, and offset CAEPs, whereas there were no significant differences among the latter three. Pairwise comparisons within each f_c showed that the onset CAEPs were significantly larger than the offset CAEPs only for 400 Hz and 1200 Hz. There were no significant differences between ACC1 and ACC2 for any carrier frequency. Significant correlations were observed between the onset N1P2 amplitudes of most carrier frequencies, except for 400 vs 1200 Hz and 400 vs 1600 Hz.

The mean N1 latency was 114/132/137/95 ms for onset/ACC1/ACC2/offset, respectively. The GLMrm analysis showed a significant effect of response type (p < 0.001), but no significant effect of f_c or of their interaction. Pairwise comparisons showed significant differences between most response types (p < 0.01), except between ACC1 and ACC2. The N1 latency of the ACC responses was significantly longer than that of the onset response, while the offset response had the shortest latency, significantly shorter than that of all other response types.

In summary, the results from experiment 1 were generally consistent with Ross et al. (2007b). For example, the mean P1, N1, and P2 amplitudes of the ACC were smaller than those of the onset response: P1, 1.684/1.405/1.005/0.248 µV; N1, 3.324/-1.299/-1.019/-2.146 µV; P2, 3.946/2.128/1.862/2.379 µV for onset/ACC1/ACC2/offset. The ACC latencies were delayed compared with the corresponding onset and offset latencies: P1, 42/46/57/27 ms; N1, 114/132/137/95 ms; P2, 211/227/240/213 ms for onset/ACC1/ACC2/offset. The mean latencies of both P1 and N1 were in the same range as, but slightly shorter than, those in Ross et al. (2007a). The ACC1 and ACC2 evoked by ITD_FS changes had longer latencies than the onset response, but the differences were smaller than those in Ross et al. (2007a). Consistent with Ross (2018), there was a tendency for larger responses to outward IPD changes (ACC1) than to inward changes (ACC2) at the lower carrier frequencies; however, it was not significant here (p > 0.5).
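The N1P2 amplitude and N1 latency reported here are peak measures. One common way to obtain them from an averaged waveform is sketched below; the search windows, the function name `n1p2`, and the synthetic waveform are assumptions for illustration, not the procedure documented in this study.

```python
import numpy as np

def n1p2(waveform_uv, times_ms, n1_win=(70.0, 180.0), p2_end=300.0):
    """Return (N1 latency in ms, N1-P2 peak-to-peak amplitude in microvolts).
    N1 is the most negative sample inside n1_win (ASSUMED window); P2 is the
    most positive sample between N1 and p2_end (ASSUMED limit)."""
    n1_mask = (times_ms >= n1_win[0]) & (times_ms <= n1_win[1])
    n1_idx = np.flatnonzero(n1_mask)[np.argmin(waveform_uv[n1_mask])]
    p2_mask = (times_ms > times_ms[n1_idx]) & (times_ms <= p2_end)
    p2_idx = np.flatnonzero(p2_mask)[np.argmax(waveform_uv[p2_mask])]
    return times_ms[n1_idx], waveform_uv[p2_idx] - waveform_uv[n1_idx]

# SYNTHETIC waveform: a negative deflection near 114 ms and a positive one near 211 ms.
times = np.arange(0.0, 500.0, 1.0)
wave = (-3.0 * np.exp(-(times - 114.0) ** 2 / (2 * 20.0 ** 2))
        + 4.0 * np.exp(-(times - 211.0) ** 2 / (2 * 20.0 ** 2)))
n1_latency, n1p2_amp = n1p2(wave, times)
```

Measuring N1P2 peak to peak makes the amplitude insensitive to slow baseline offsets, which is one reason it is often preferred over separate N1 and P2 values.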

 CAEPs of experiment 2
In experiment 2 (ITD_ENV; paper Figure 4B and Figure 6B), as in experiment 1, there were clear onset and offset responses in all four test conditions. The onset N1P2 amplitude was larger than the offset amplitude, but the difference between onset and offset CAEPs was smaller than in experiment 1 (paper Figure 4A and Figure 6A). Consistent with the findings of Ross (2018), the N1P2 amplitudes of both onset and offset CAEPs were larger than those of the ACC responses, which were tiny (close to the noise floor) or absent. A GLMrm analysis of the N1P2 amplitude revealed significant effects of f_m, response type, and their interaction. The mean amplitude was 4.249/4.228/5.258/5.665 µV for f_m of 40/80/160/320 Hz, respectively. A significant difference between modulation frequencies was observed only for 80 Hz vs 160 Hz. Consistent with this, pairwise comparisons within each response type showed only that the onset N1P2 amplitude was smaller in the 80 Hz condition than in the 160 Hz condition, and only just significantly (p = 0.048). The mean amplitude was 8.547/2.543/1.853/6.458 µV for onset/ACC1/ACC2/offset, respectively. Both onset and offset CAEPs were larger than the ACC responses. There were no significant differences between ACC1 and ACC2, or between onset and offset. Within each f_m, pairwise comparisons likewise showed no significant difference between onset and offset CAEPs, or between ACC1 and ACC2 responses (near the noise floor). The onset and offset CAEPs were significantly larger than the ACC responses, except for the comparison between ACC1 and offset for f_m = 80 Hz, and between ACC1 and offset and between ACC2 and offset for f_m = 160 Hz. Significant correlations were observed among modulation frequencies for all onset N1P2 amplitudes and for most offset N1P2 amplitudes, except for 320 vs 40 Hz and 320 vs 160 Hz. The offset CAEPs were more correlated with the ACC responses than with the onset responses, mainly due to their small amplitudes.

For N1 latency, the GLMrm analysis showed no significant effect of either f_m or response type. The mean latency was 107/109/107/101 ms for onset/ACC1/ACC2/offset, and 107/105/104/109 ms for 40/80/160/320 Hz.

 CAEPs of experiment 3
For the filtered clicks (paper Figure 4C and Figure 6C), no ACC2 responses (to inward IPTD changes) were recorded in experiment 3. Overall, the N1P2 amplitude of both onset and offset responses increased with increasing pulse rate. As in experiment 2, the ACC1 responses were either small (near the noise floor) or absent. For N1P2 amplitude, GLMrm showed a significant effect of pulse rate, response type, and their interaction (p < 0.01). The mean amplitude was 2.31/3.43/5.36/5.35 µV for 40/80/160/320 pps, respectively. There were no significant differences between pulse rates of 40 and 80 pps, or between 160 and 320 pps. Within each response type, pairwise comparisons showed no significant differences between pulse rates for either ACC1 or offset responses. For the onset CAEPs, there were significant differences between most pulse rates (p < 0.01), except for 40 vs 80 pps and 160 vs 320 pps. The mean amplitude was 5.48/2.52/4.34 µV for onset/ACC1/offset, and only the difference between onset and ACC1 responses was significant. Within each pulse rate, pairwise comparisons showed no significant differences between response types for most pulse rates, except that for 160 pps and 320 pps the onset N1P2 amplitude was significantly larger than the offset amplitude (p = 0.009 and p = 0.014, respectively). There were no correlations between the N1P2 amplitudes of different pulse rates for either onset or offset responses.

For N1 latency, GLMrm revealed a significant effect of pulse rate, but not of response type or of their interaction. The mean latency was 141/134/119/114 ms for 40/80/160/320 pps, respectively. The N1 latency was significantly shorter for 320 pps than for 40 pps and 80 pps, and for 160 pps than for 80 pps. Within each response type, pairwise comparisons showed significant N1 latency differences only for 320 pps vs 40 pps, 320 pps vs 80 pps, and 80 pps vs 160 pps for the onset CAEPs, and between 40 pps and 320 pps (p = 0.044) for ACC1. The mean latency was 131/129/120 ms for onset/ACC1/offset responses, with no significant differences between them. Within each pulse rate, there were almost no significant differences between the three response types, except that the onset N1 latency was significantly longer than the offset latency (p = 0.011) for 80 pps.

Comparison of CAEPs evoked by stimuli with a 40-Hz modulation rate
In experiment 3, we did not measure ACC2 data (inward changes) for the filtered clicks, so only the onset, ACC1 (outward changes), and offset responses of the five types of 40-Hz modulated SAM tones and the 40-pps filtered clicks were analyzed using GLMrm (with factors stimulus type [400/800/1200/1600/4000SAM/40-pps-clicks] and response type [onset, ACC1, and offset]). GLMrm showed a significant effect (p < 0.005) of stimulus type, response type, and their interaction on the N1P2 amplitude. The mean amplitude was 7.544/6.271/5.860/4.759/5.046/2.309 µV for 400/800/1200/1600/4000SAM/40-pps-clicks, respectively. Pairwise comparisons revealed significant differences between 400 Hz and 1600 Hz SAM tones, and between 40-pps filtered clicks and all four low-frequency SAM tones. Within each response type, the pairwise comparisons showed that: 1) the onset response amplitude of the 40-pps filtered clicks was significantly smaller than that of most SAM tones, except for 800 Hz; 2) for the ACC1 response, the 400 Hz and 800 Hz SAM tones evoked significantly larger responses than the other three SAM tones and the 40-pps filtered clicks; 3) there were no significant differences among stimulus types for the offset responses. The GLMrm analysis revealed a significant effect of stimulus type, response type, and their interaction on N1 latency. The mean latency was 110/112/114/120/104/141 ms for 400/800/1200/1600/4000SAM/40-pps-clicks.
Pairwise comparisons showed significant differences in latency between 1600 Hz and 4000 Hz SAM tones, and between 40-pps filtered clicks and the four low-frequency SAM tones. Further analysis within each response type showed that, for onset responses, the 40-pps filtered clicks had a significantly different latency from most other stimulus types, except for the 1600 Hz SAM tones (p = 0.051). Within each stimulus type, there were no significant differences in latency among the three response types for either the 4000 Hz SAM tones or the 40-pps filtered clicks.